 hjb equation


Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?

Neural Information Processing Systems

In particular, we leverage the concept of stability in the literature of partial differential equations to study the asymptotic behavior of the learned solution as the loss approaches zero. With this concept, we study an important class of high-dimensional non-linear PDEs in optimal control, the Hamilton-Jacobi-Bellman (HJB) Equation, and prove that for general $L^p$ Physics-Informed Loss, a wide class of HJB equations is stable only if $p$ is sufficiently large.



Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?

Neural Information Processing Systems

The Physics-Informed Neural Network (PINN) approach is a new and promising way to solve partial differential equations using deep learning. The $L^2$ Physics-Informed Loss is the de-facto standard in training Physics-Informed Neural Networks. In this paper, we challenge this common practice by investigating the relationship between the loss function and the approximation quality of the learned solution. In particular, we leverage the concept of stability in the literature of partial differential equations to study the asymptotic behavior of the learned solution as the loss approaches zero. With this concept, we study an important class of high-dimensional non-linear PDEs in optimal control, the Hamilton-Jacobi-Bellman (HJB) Equation, and prove that for general $L^p$ Physics-Informed Loss, a wide class of HJB equations is stable only if $p$ is sufficiently large. Therefore, the commonly used $L^2$ loss is not suitable for training PINNs on those equations, while $L^{\infty}$ loss is a better choice. Based on the theoretical insight, we develop a novel PINN training algorithm to minimize the $L^{\infty}$ loss for HJB equations, which is similar in spirit to adversarial training. The effectiveness of the proposed algorithm is empirically demonstrated through experiments.
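To illustrate why the choice of $p$ can matter, the following is a minimal sketch that compares discrete $L^p$ norms of a residual concentrated in a narrow spike. The spike shape, the width `eps`, the grid, and the helper name `lp_norm` are all assumptions chosen for illustration. The code does not reproduce the paper's stability analysis or its training algorithm.

```python
import numpy as np

def lp_norm(residual, p):
    """Discrete L^p norm of residual samples on a uniform grid (mean-based)."""
    r = np.abs(residual)
    if np.isinf(p):
        # L^infinity norm: the largest pointwise residual.
        return float(r.max())
    return float(np.mean(r ** p) ** (1.0 / p))

# Hypothetical residual: a spike of height ~1 and width ~eps around x = 0.
eps = 1e-3
x = np.linspace(-1.0, 1.0, 20001)
residual = np.exp(-(x / eps) ** 2)

l2 = lp_norm(residual, 2)
l8 = lp_norm(residual, 8)
linf = lp_norm(residual, np.inf)
print(f"L2 = {l2:.4f}, L8 = {l8:.4f}, Linf = {linf:.4f}")
```

In this toy setting the $L^2$ value is small even though the pointwise residual is of order one at the spike, while larger $p$ weights the spike more heavily. This is only meant to convey the intuition that a small $L^2$ loss need not control localized errors.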


DeepPAAC: A New Deep Galerkin Method for Principal-Agent Problems

Ludkovski, Michael, Xie, Changgen, Zhu, Zimu

arXiv.org Artificial Intelligence

We consider numerical resolution of principal-agent (PA) problems in continuous time. We formulate a generic PA model with continuous and lump payments and a multi-dimensional strategy of the agent. To tackle the resulting Hamilton-Jacobi-Bellman equation with an implicit Hamiltonian, we develop a novel deep learning method: the Deep Principal-Agent Actor Critic (DeepPAAC) algorithm. DeepPAAC is able to handle multi-dimensional states and controls, as well as constraints. We investigate the role of the neural network architecture, training designs, loss functions, etc. on the convergence of the solver, presenting five different case studies.


A Temporal Difference Method for Stochastic Continuous Dynamics

Settai, Haruki, Takeishi, Naoya, Yairi, Takehisa

arXiv.org Artificial Intelligence

For continuous systems modeled by dynamical equations such as ODEs and SDEs, Bellman's Principle of Optimality takes the form of the Hamilton-Jacobi-Bellman (HJB) equation, which provides the theoretical target of reinforcement learning (RL). Although recent advances in RL successfully leverage this formulation, the existing methods typically assume the underlying dynamics are known a priori because they need explicit access to the coefficient functions of dynamical equations to update the value function following the HJB equation. We address this inherent limitation of HJB-based RL; we propose a model-free approach still targeting the HJB equation and propose the corresponding temporal difference method. We establish exponential convergence of the idealized continuous-time dynamics and empirically demonstrate its potential advantages over transition-kernel-based formulations. The proposed formulation paves the way toward bridging stochastic control and model-free reinforcement learning.


Ensemble based Closed-Loop Optimal Control using Physics-Informed Neural Networks

Barry-Straume, Jostein, Verulkar, Adwait D., Sarshar, Arash, Popov, Andrey A., Sandu, Adrian

arXiv.org Artificial Intelligence

The objective of designing a control system is to steer a dynamical system with a control signal, guiding it to exhibit the desired behavior. The Hamilton-Jacobi-Bellman (HJB) partial differential equation offers a framework for optimal control system design. However, numerical solutions to this equation are computationally intensive, and analytical solutions are frequently unavailable. Knowledge-guided machine learning methodologies, such as physics-informed neural networks (PINNs), offer new alternative approaches that can alleviate the difficulties of solving the HJB equation numerically. This work presents a multistage ensemble framework to learn the optimal cost-to-go, and subsequently the corresponding optimal control signal, through the HJB equation. Prior PINN-based approaches rely on a stabilizer term for the HJB enforcement during training. Our framework does not use stabilizer terms and offers a means of controlling the nonlinear system, via either a singular learned control signal or an ensemble control signal policy. Success is demonstrated in closed-loop control, using both ensemble- and singular-control, of a steady-state time-invariant two-state continuous nonlinear system with an infinite time horizon, accounting for noisy, perturbed system states and varying initial conditions.


Continuous Q-Score Matching: Diffusion Guided Reinforcement Learning for Continuous-Time Control

Hua, Chengxiu, Gu, Jiawen, Tang, Yushun

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has achieved significant success across a wide range of domains; however, most existing methods are formulated in discrete time. In this work, we introduce a novel RL method for continuous-time control, where stochastic differential equations govern state-action dynamics. Departing from traditional value function-based approaches, our key contribution is the characterization of continuous-time Q-functions via a martingale condition and the linking of diffusion policy scores to the action gradient of a learned continuous Q-function by the dynamic programming principle. This insight motivates Continuous Q-Score Matching (CQSM), a score-based policy improvement algorithm. Notably, our method addresses a long-standing challenge in continuous-time RL: preserving the action-evaluation capability of Q-functions without relying on time discretization. We further provide theoretical closed-form solutions for linear-quadratic (LQ) control problems within our framework. Numerical results in simulated environments demonstrate the effectiveness of our proposed method and compare it to popular baselines.



SIMPOL Model for Solving Continuous-Time Heterogeneous Agent Problems

Salguero, Ricardo Alonzo Fernández

arXiv.org Artificial Intelligence

This paper presents SIMPOL (Simplified Policy Iteration), a modular numerical framework for solving continuous-time heterogeneous agent models. The core economic problem, the optimization of consumption and savings under idiosyncratic uncertainty, is formulated as a coupled system of partial differential equations: a Hamilton-Jacobi-Bellman (HJB) equation for the agent's optimal policy and a Fokker-Planck-Kolmogorov (FPK) equation for the stationary wealth distribution. SIMPOL addresses this system using Howard's policy iteration with an *upwind* finite difference scheme that guarantees stability. A distinctive contribution is a novel consumption policy post-processing module that imposes regularity through smoothing and a projection onto an economically plausible slope band, improving convergence and model behavior. The robustness and accuracy of SIMPOL are validated through a set of integrated diagnostics, including verification of contraction in the Wasserstein-2 metric and comparison with the analytical solution of the Merton model in the no-volatility case. The framework is shown to be not only computationally efficient but also to produce solutions consistent with economic and mathematical theory, offering a reliable tool for research in quantitative macroeconomics.
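As a rough illustration of the upwind idea mentioned above, the sketch below picks a forward difference where the drift is positive and a backward difference where it is not. The function name `upwind_derivative`, the boundary handling, and the test data are assumptions for illustration and are not taken from SIMPOL. In an actual HJB solver the drift typically depends on the value function itself, which this sketch does not model.

```python
import numpy as np

def upwind_derivative(v, drift, dx):
    """Approximate dv/dx on a uniform grid, differencing in the drift direction."""
    forward = np.empty_like(v)
    backward = np.empty_like(v)
    diffs = (v[1:] - v[:-1]) / dx
    forward[:-1] = diffs
    forward[-1] = diffs[-1]   # assumption: reuse the last difference at the upper edge
    backward[1:] = diffs
    backward[0] = diffs[0]    # assumption: reuse the first difference at the lower edge
    # Forward difference where the state drifts upward, backward otherwise.
    return np.where(drift > 0, forward, backward)

# Hypothetical data: v(x) = x^2 with drift pointing toward the grid midpoint.
x = np.linspace(0.0, 1.0, 11)
v = x ** 2
drift = np.where(x < 0.5, 1.0, -1.0)
dv = upwind_derivative(v, drift, dx=0.1)
print(dv)
```

Differencing in the direction the state moves is what makes such schemes monotone, which is the usual route to the stability property the abstract refers to.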


Gaussian process policy iteration with additive Schwarz acceleration for forward and inverse HJB and mean field game problems

Yang, Xianjin, Zhang, Jingguo

arXiv.org Artificial Intelligence

We propose a Gaussian Process (GP)-based policy iteration framework for addressing both forward and inverse problems in Hamilton-Jacobi-Bellman (HJB) equations and mean field games (MFGs). Policy iteration is formulated as an alternating procedure between solving the value function under a fixed control policy and updating the policy based on the resulting value function. By exploiting the linear structure of GPs for function approximation, each policy evaluation step admits an explicit closed-form solution, eliminating the need for numerical optimization. To improve convergence, we incorporate the additive Schwarz acceleration as a preconditioning step following each policy update. Numerical experiments demonstrate the effectiveness of Schwarz acceleration in improving computational efficiency.
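The alternating structure of policy iteration can be sketched on a much simpler problem than the one treated in the paper: a deterministic scalar linear-quadratic problem with dynamics dx/dt = a x + b u and running cost q x^2 + r u^2. This is an assumed toy setting with no Gaussian processes and no Schwarz acceleration. With a linear policy u = -k x and value function V(x) = p x^2, policy evaluation reduces to one linear equation for p, and policy improvement sets k = b p / r. The function name and parameter values are hypothetical.

```python
import math

def policy_iteration_lq(a, b, q, r, k0, n_iter=20):
    """Policy iteration for a scalar LQ problem with policy u = -k * x."""
    k = k0
    p = None
    for _ in range(n_iter):
        if b * k <= a:
            raise ValueError("gain k must stabilize the system (b*k > a)")
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k^2 = 0 for p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: minimize the Hamiltonian over u, giving k = b*p/r.
        k = b * p / r
    return p, k

a, b, q, r = 1.0, 1.0, 1.0, 1.0
p, k = policy_iteration_lq(a, b, q, r, k0=2.0)

# Reference: positive root of the scalar Riccati equation 2*a*p - b^2*p^2/r + q = 0.
p_riccati = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
print(p, p_riccati)
```

In this toy case each evaluation step is available in closed form, loosely mirroring the abstract's point that a linear function-approximation structure can make policy evaluation explicit.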